REPORT ON HOW TO DEPLOY AI/ML FOR CORONA CASES IN THE REAL WORLD.

Phase 2 Provision

(This version may include code by Mark, drawing on the ideas, experience and knowledge gained while building our 2nd-place solution, together with a teammate, at the FruitBunchAI hackathon, an event of the TU/e Artificial Intelligence student association. That solution addressed the mental-health cost of corona.)

The code received support from IBM and from friends with strong data science skills. To avoid any confusion for the audience:

I am a student, not a researcher. This work may also appear on GitHub; this version is our data study project. It is not meant as regulatory guidance and does not aim to change government rules.

Background on the topic is available here:

https://bmrheijligers.medium.com/coronacalm-hackingmentalhealth-749d4a4e7af0

For the original challenge we decided not to pursue facial recognition; this version focuses on corona measures instead.

  • In this notebook, we take a deep dive into COVID analytics for each region and examine how active cases evolve. The data set comes from data.rivm.nl (Netherlands), a reputable and authoritative source. Even so, the government's records are not fully precise, because many infected people never get tested or never report a positive test.

Goal:

  • Understand COVID measures and analyze the related issues.
  • Use AI to predict the most effective measures for the Netherlands (NL).
In [1]:
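import warnings

import pandas as pd
import numpy  as np
import matplotlib
import matplotlib.pyplot as plt
from pyearth import Earth    # MARS (Multivariate Adaptive Regression Splines) model
from pyearth import export   # exports a fitted Earth model as plain Python code
#from jupyterthemes import jtplot
#jtplot.style(theme='onedork')

%matplotlib inline
# np.warnings is not available in newer NumPy releases; use the standard library module.
warnings.filterwarnings('ignore')

# pd.__version__
# np.__version__
print('matplotlib: {}'.format(matplotlib.__version__))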
matplotlib: 3.3.4

Raw Data Cleaning and processing.

In [2]:
# !wget -N https://data.rivm.nl/covid-19/COVID-19_casus_landelijk.json
# PySpark copes well with large data sets and makes it easy to preview a frame.
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()  
In [3]:
df4 = spark.read.options(header='True', inferSchema='True', delimiter=';').csv('/home/markn/my_project/corona/COVID-19_casus_landelijk.csv')
df4.show(10)
+-------------------+---------------+--------------------+--------+------+-------------+------------------+--------+-------------+------------------------+
|          Date_file|Date_statistics|Date_statistics_type|Agegroup|   Sex|     Province|Hospital_admission|Deceased|Week_of_death|Municipal_health_service|
+-------------------+---------------+--------------------+--------+------+-------------+------------------+--------+-------------+------------------------+
|2021-03-23 10:00:00|     2020-01-01|                 DOO|   40-49|Female|Noord-Holland|                No|      No|         null|           GGD Amsterdam|
|2021-03-23 10:00:00|     2020-01-01|                 DOO|   50-59|  Male|   Gelderland|                No|      No|         null|    Veiligheids- en G...|
|2021-03-23 10:00:00|     2020-01-01|                 DOO|   20-29|Female| Zuid-Holland|                No|      No|         null|     GGD Hollands-Midden|
|2021-03-23 10:00:00|     2020-01-01|                 DOO|   60-69|Female|Noord-Holland|                No|      No|         null|    GGD Hollands-Noorden|
|2021-03-23 10:00:00|     2020-01-04|                 DOO|   10-19|Female|   Gelderland|           Unknown|      No|         null|     GGD Gelderland-Zuid|
|2021-03-23 10:00:00|     2020-01-06|                 DOO|   30-39|  Male|      Limburg|           Unknown| Unknown|         null|        GGD Zuid-Limburg|
|2021-03-23 10:00:00|     2020-01-16|                 DOO|     0-9|Female| Zuid-Holland|                No|      No|         null|    GGD Rotterdam-Rij...|
|2021-03-23 10:00:00|     2020-01-20|                 DOO|   50-59|Female|   Gelderland|                No|      No|         null|     GGD Gelderland-Zuid|
|2021-03-23 10:00:00|     2020-01-20|                 DOO|     0-9|  Male|   Gelderland|                No|      No|         null|     GGD Gelderland-Zuid|
|2021-03-23 10:00:00|     2020-01-22|                 DOO|   80-89|Female| Zuid-Holland|           Unknown| Unknown|         null|     GGD Hollands-Midden|
+-------------------+---------------+--------------------+--------+------+-------------+------------------+--------+-------------+------------------------+
only showing top 10 rows

In [4]:
df4.printSchema()
root
 |-- Date_file: string (nullable = true)
 |-- Date_statistics: string (nullable = true)
 |-- Date_statistics_type: string (nullable = true)
 |-- Agegroup: string (nullable = true)
 |-- Sex: string (nullable = true)
 |-- Province: string (nullable = true)
 |-- Hospital_admission: string (nullable = true)
 |-- Deceased: string (nullable = true)
 |-- Week_of_death: integer (nullable = true)
 |-- Municipal_health_service: string (nullable = true)
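The schema shows that inferSchema left both date columns as strings; only Week_of_death became an integer. If the file is read with pandas instead of Spark, the dates can be parsed at load time. Below is a minimal sketch that assumes the CSV keeps the ';'-separated layout shown above. Its two sample rows are copied from the output above, and `cases_df` is an illustrative name. In the notebook you would pass the CSV path instead of the in-memory sample.

```python
import io

import pandas as pd

# Two rows copied from the output above, in the file's ';'-separated layout.
sample = io.StringIO(
    "Date_file;Date_statistics;Date_statistics_type;Agegroup;Sex;Province;"
    "Hospital_admission;Deceased;Week_of_death;Municipal_health_service\n"
    "2021-03-23 10:00:00;2020-01-01;DOO;40-49;Female;Noord-Holland;No;No;;GGD Amsterdam\n"
    "2021-03-23 10:00:00;2020-01-31;DOO;80-89;Male;Noord-Brabant;Yes;Yes;202015;GGD Brabant-Zuidoost\n"
)

# parse_dates converts both date columns while reading, so no separate
# astype/to_datetime step is needed afterwards.
cases_df = pd.read_csv(sample, sep=";", parse_dates=["Date_file", "Date_statistics"])
print(cases_df.dtypes)
```

The empty Week_of_death field becomes NaN, which is why that column ends up as float64 in the pandas frame, as in the df.info() output further down.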

After checking the data set's columns, we need to know exactly which provinces of the Netherlands are included.

In [5]:
df4.select("Province").distinct().show(truncate=False)
+-------------+
|Province     |
+-------------+
|Overijssel   |
|Flevoland    |
|Zeeland      |
|Noord-Brabant|
|Fryslân      |
|Noord-Holland|
|Gelderland   |
|Utrecht      |
|Limburg      |
|Drenthe      |
|Zuid-Holland |
|Groningen    |
+-------------+

In [6]:
df = df4.toPandas()
df.head(10)
Out[6]:
Date_file Date_statistics Date_statistics_type Agegroup Sex Province Hospital_admission Deceased Week_of_death Municipal_health_service
0 2021-03-23 10:00:00 2020-01-01 DOO 40-49 Female Noord-Holland No No NaN GGD Amsterdam
1 2021-03-23 10:00:00 2020-01-01 DOO 50-59 Male Gelderland No No NaN Veiligheids- en Gezondheidsregio Gelderland-Mi...
2 2021-03-23 10:00:00 2020-01-01 DOO 20-29 Female Zuid-Holland No No NaN GGD Hollands-Midden
3 2021-03-23 10:00:00 2020-01-01 DOO 60-69 Female Noord-Holland No No NaN GGD Hollands-Noorden
4 2021-03-23 10:00:00 2020-01-04 DOO 10-19 Female Gelderland Unknown No NaN GGD Gelderland-Zuid
5 2021-03-23 10:00:00 2020-01-06 DOO 30-39 Male Limburg Unknown Unknown NaN GGD Zuid-Limburg
6 2021-03-23 10:00:00 2020-01-16 DOO 0-9 Female Zuid-Holland No No NaN GGD Rotterdam-Rijnmond
7 2021-03-23 10:00:00 2020-01-20 DOO 50-59 Female Gelderland No No NaN GGD Gelderland-Zuid
8 2021-03-23 10:00:00 2020-01-20 DOO 0-9 Male Gelderland No No NaN GGD Gelderland-Zuid
9 2021-03-23 10:00:00 2020-01-22 DOO 80-89 Female Zuid-Holland Unknown Unknown NaN GGD Hollands-Midden
In [7]:
# Alternative without Spark: read the CSV directly with pandas and parse the dates on load.
# df = pd.read_csv('/home/markn/my_project/corona/COVID-19_casus_landelijk.csv', sep=';', parse_dates=['Date_file', 'Date_statistics'])

# Convert the two date columns from strings to datetimes.
df['Date_file'] = pd.to_datetime(df['Date_file'])
df['Date_statistics'] = pd.to_datetime(df['Date_statistics'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1213366 entries, 0 to 1213365
Data columns (total 10 columns):
Date_file                   1213366 non-null datetime64[ns]
Date_statistics             1213366 non-null datetime64[ns]
Date_statistics_type        1213366 non-null object
Agegroup                    1213366 non-null object
Sex                         1213366 non-null object
Province                    1213366 non-null object
Hospital_admission          1213366 non-null object
Deceased                    1213366 non-null object
Week_of_death               16338 non-null float64
Municipal_health_service    1213366 non-null object
dtypes: datetime64[ns](2), float64(1), object(7)
memory usage: 92.6+ MB
In [8]:
df.count()
Out[8]:
Date_file                   1213366
Date_statistics             1213366
Date_statistics_type        1213366
Agegroup                    1213366
Sex                         1213366
Province                    1213366
Hospital_admission          1213366
Deceased                    1213366
Week_of_death                 16338
Municipal_health_service    1213366
dtype: int64
In [9]:
daterep = 'Date_statistics'
region  = 'Municipal_health_service'
cases   = 'cases'
deaths  = 'Deceased'
# Cut-off date: one week before the most recent Date_statistics value.
lastdate = df[daterep].max() - pd.Timedelta('7 days')

# Each row is one reported case; Deceased becomes 1 for 'Yes' and 0 otherwise.
df[cases]  = 1
df[deaths] = (df[deaths] == 'Yes').astype(int)
df
Out[9]:
Date_file Date_statistics Date_statistics_type Agegroup Sex Province Hospital_admission Deceased Week_of_death Municipal_health_service cases
0 2021-03-23 10:00:00 2020-01-01 DOO 40-49 Female Noord-Holland No 0 NaN GGD Amsterdam 1
1 2021-03-23 10:00:00 2020-01-01 DOO 50-59 Male Gelderland No 0 NaN Veiligheids- en Gezondheidsregio Gelderland-Mi... 1
2 2021-03-23 10:00:00 2020-01-01 DOO 20-29 Female Zuid-Holland No 0 NaN GGD Hollands-Midden 1
3 2021-03-23 10:00:00 2020-01-01 DOO 60-69 Female Noord-Holland No 0 NaN GGD Hollands-Noorden 1
4 2021-03-23 10:00:00 2020-01-04 DOO 10-19 Female Gelderland Unknown 0 NaN GGD Gelderland-Zuid 1
5 2021-03-23 10:00:00 2020-01-06 DOO 30-39 Male Limburg Unknown 0 NaN GGD Zuid-Limburg 1
6 2021-03-23 10:00:00 2020-01-16 DOO 0-9 Female Zuid-Holland No 0 NaN GGD Rotterdam-Rijnmond 1
7 2021-03-23 10:00:00 2020-01-20 DOO 50-59 Female Gelderland No 0 NaN GGD Gelderland-Zuid 1
8 2021-03-23 10:00:00 2020-01-20 DOO 0-9 Male Gelderland No 0 NaN GGD Gelderland-Zuid 1
9 2021-03-23 10:00:00 2020-01-22 DOO 80-89 Female Zuid-Holland Unknown 0 NaN GGD Hollands-Midden 1
10 2021-03-23 10:00:00 2020-01-24 DOO 40-49 Male Limburg No 0 NaN GGD Zuid-Limburg 1
11 2021-03-23 10:00:00 2020-01-25 DOO 50-59 Male Drenthe No 0 NaN GGD Drenthe 1
12 2021-03-23 10:00:00 2020-01-26 DOO 50-59 Male Groningen No 0 NaN GGD Groningen 1
13 2021-03-23 10:00:00 2020-01-26 DOO 50-59 Male Noord-Brabant No 0 NaN GGD West-Brabant 1
14 2021-03-23 10:00:00 2020-01-27 DOO 50-59 Female Zuid-Holland No 0 NaN GGD Haaglanden 1
15 2021-03-23 10:00:00 2020-01-27 DOO 60-69 Male Gelderland No 0 NaN Veiligheids- en Gezondheidsregio Gelderland-Mi... 1
16 2021-03-23 10:00:00 2020-01-27 DOO 20-29 Male Groningen No 0 NaN GGD Groningen 1
17 2021-03-23 10:00:00 2020-01-29 DOO 80-89 Male Overijssel No 0 NaN GGD Regio Twente 1
18 2021-03-23 10:00:00 2020-01-31 DOO 80-89 Male Noord-Brabant Yes 1 202015.0 GGD Brabant-Zuidoost 1
19 2021-03-23 10:00:00 2020-01-31 DOO 90+ Female Noord-Holland No 0 NaN GGD Amsterdam 1
20 2021-03-23 10:00:00 2020-01-31 DOO 50-59 Male Overijssel No 0 NaN GGD Regio Twente 1
21 2021-03-23 10:00:00 2020-02-01 DOO 60-69 Female Limburg Yes 0 NaN GGD Limburg-Noord 1
22 2021-03-23 10:00:00 2020-02-01 DOO 60-69 Female Overijssel Yes 0 NaN GGD IJsselland 1
23 2021-03-23 10:00:00 2020-02-01 DOO 50-59 Female Zuid-Holland No 0 NaN GGD Hollands-Midden 1
24 2021-03-23 10:00:00 2020-02-03 DOO 50-59 Male Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
25 2021-03-23 10:00:00 2020-02-03 DOO 80-89 Female Overijssel Unknown 0 NaN GGD IJsselland 1
26 2021-03-23 10:00:00 2020-02-06 DOO 20-29 Female Noord-Brabant No 0 NaN GGD Brabant-Zuidoost 1
27 2021-03-23 10:00:00 2020-02-06 DOO 80-89 Male Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
28 2021-03-23 10:00:00 2020-02-07 DOO 70-79 Male Noord-Brabant No 0 NaN GGD Hart voor Brabant 1
29 2021-03-23 10:00:00 2020-02-07 DOO 70-79 Female Noord-Brabant Yes 0 NaN GGD Brabant-Zuidoost 1
... ... ... ... ... ... ... ... ... ... ... ...
1213336 2021-03-23 10:00:00 2021-03-23 DPL 10-19 Female Utrecht Unknown 0 NaN GGD Regio Utrecht 1
1213337 2021-03-23 10:00:00 2021-03-23 DPL 70-79 Male Zuid-Holland Unknown 0 NaN GGD Hollands-Midden 1
1213338 2021-03-23 10:00:00 2021-03-23 DPL 30-39 Male Noord-Holland Unknown 0 NaN GGD Kennemerland 1
1213339 2021-03-23 10:00:00 2021-03-23 DPL 20-29 Male Gelderland Unknown 0 NaN Veiligheids- en Gezondheidsregio Gelderland-Mi... 1
1213340 2021-03-23 10:00:00 2021-03-23 DPL 50-59 Female Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213341 2021-03-23 10:00:00 2021-03-23 DPL 70-79 Female Overijssel Unknown 0 NaN GGD IJsselland 1
1213342 2021-03-23 10:00:00 2021-03-23 DPL 60-69 Male Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213343 2021-03-23 10:00:00 2021-03-23 DPL 50-59 Male Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213344 2021-03-23 10:00:00 2021-03-23 DPL 60-69 Female Noord-Brabant Unknown 0 NaN GGD Brabant-Zuidoost 1
1213345 2021-03-23 10:00:00 2021-03-23 DON 50-59 Female Zuid-Holland Unknown 0 NaN GGD Haaglanden 1
1213346 2021-03-23 10:00:00 2021-03-23 DPL 20-29 Male Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213347 2021-03-23 10:00:00 2021-03-23 DON 60-69 Female Zuid-Holland Unknown 0 NaN GGD Hollands-Midden 1
1213348 2021-03-23 10:00:00 2021-03-23 DPL 60-69 Female Zuid-Holland Unknown 0 NaN Dienst Gezondheid & Jeugd ZHZ 1
1213349 2021-03-23 10:00:00 2021-03-23 DPL 50-59 Male Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213350 2021-03-23 10:00:00 2021-03-23 DPL 50-59 Male Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213351 2021-03-23 10:00:00 2021-03-23 DPL 50-59 Female Zuid-Holland Unknown 0 NaN GGD Rotterdam-Rijnmond 1
1213352 2021-03-23 10:00:00 2021-03-23 DPL 0-9 Female Zuid-Holland Unknown 0 NaN GGD Hollands-Midden 1
1213353 2021-03-23 10:00:00 2021-03-23 DPL 50-59 Female Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213354 2021-03-23 10:00:00 2021-03-23 DPL 10-19 Male Zuid-Holland Unknown 0 NaN GGD Haaglanden 1
1213355 2021-03-23 10:00:00 2021-03-23 DON 50-59 Female Noord-Holland Unknown 0 NaN GGD Gooi en Vechtstreek 1
1213356 2021-03-23 10:00:00 2021-03-23 DPL 50-59 Female Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213357 2021-03-23 10:00:00 2021-03-23 DPL 50-59 Male Zuid-Holland Unknown 0 NaN GGD Haaglanden 1
1213358 2021-03-23 10:00:00 2021-03-23 DPL 10-19 Female Zuid-Holland Unknown 0 NaN GGD Haaglanden 1
1213359 2021-03-23 10:00:00 2021-03-23 DPL 0-9 Female Utrecht Unknown 0 NaN GGD Regio Utrecht 1
1213360 2021-03-23 10:00:00 2021-03-23 DPL 40-49 Female Zuid-Holland Unknown 0 NaN GGD Hollands-Midden 1
1213361 2021-03-23 10:00:00 2021-03-23 DPL 60-69 Male Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213362 2021-03-23 10:00:00 2021-03-23 DPL 70-79 Female Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213363 2021-03-23 10:00:00 2021-03-23 DON 40-49 Female Zuid-Holland Unknown 0 NaN GGD Hollands-Midden 1
1213364 2021-03-23 10:00:00 2021-03-23 DPL 30-39 Female Noord-Brabant Unknown 0 NaN GGD Hart voor Brabant 1
1213365 2021-03-23 10:00:00 2021-03-23 DPL 10-19 Male Zuid-Holland Unknown 0 NaN Dienst Gezondheid & Jeugd ZHZ 1

1213366 rows × 11 columns
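The cell above stores one row per reported case. Setting cases to 1 therefore lets a later sum count cases, and the comparison with 'Yes' turns Deceased into a 0/1 flag, with 'No' and 'Unknown' both counting as 0. lastdate is a cut-off one week before the newest Date_statistics value, and the smoothing step later stops there (`.loc[:lastdate]`). The low counts on the last dates of the daily table further down (for example 2021-03-23) suggest that the most recent days are still incomplete. That is a plausible reason for the cut-off, but the notebook does not state it. The sketch below works on hypothetical rows; `mini` and `mini_cutoff` are illustrative names.

```python
import pandas as pd

# Hypothetical mini version of df: one row per reported case.
mini = pd.DataFrame({
    "Date_statistics": pd.to_datetime(
        ["2021-03-01", "2021-03-01", "2021-03-05", "2021-03-10", "2021-03-12"]),
    "Deceased": ["No", "Yes", "Unknown", "No", "Yes"],
})

# Every row counts as one case; only 'Yes' becomes a death (1).
mini["cases"] = 1
mini["Deceased"] = (mini["Deceased"] == "Yes").astype(int)

# Cut-off one week before the newest date, mirroring lastdate.
mini_cutoff = mini["Date_statistics"].max() - pd.Timedelta("7 days")
complete = mini[mini["Date_statistics"] <= mini_cutoff]
print(mini_cutoff.date(), complete["cases"].sum(), complete["Deceased"].sum())
```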

In [10]:
# Daily totals of cases and deaths per GGD region (rows: dates, columns: (measure, region)).
df_geo = df.pivot_table(index=daterep, columns=region, values=[cases, deaths], aggfunc='sum').fillna(0)
df_geo['cases']
Out[10]:
Municipal_health_service Dienst Gezondheid & Jeugd ZHZ GGD Amsterdam GGD Brabant-Zuidoost GGD Drenthe GGD Flevoland GGD Fryslân GGD Gelderland-Zuid GGD Gooi en Vechtstreek GGD Groningen GGD Haaglanden ... GGD Limburg-Noord GGD Noord- en Oost-Gelderland GGD Regio Twente GGD Regio Utrecht GGD Rotterdam-Rijnmond GGD West-Brabant GGD Zaanstreek/Waterland GGD Zeeland GGD Zuid-Limburg Veiligheids- en Gezondheidsregio Gelderland-Midden
Date_statistics
2020-01-01 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
2020-01-04 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-06 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2020-01-16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
2020-01-20 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-22 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-24 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2020-01-25 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-26 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
2020-01-27 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
2020-01-29 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-31 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-03 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-06 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-07 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-10 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
2020-02-11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-12 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-13 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-14 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-15 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
2020-02-17 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-18 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-02-19 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
2020-02-20 0.0 2.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0
2020-02-21 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0
2020-02-22 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
2020-02-23 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 3.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-02-22 173.0 284.0 260.0 126.0 88.0 204.0 172.0 71.0 156.0 234.0 ... 170.0 159.0 134.0 383.0 403.0 174.0 155.0 89.0 112.0 127.0
2021-02-23 184.0 229.0 250.0 113.0 85.0 175.0 158.0 71.0 144.0 253.0 ... 146.0 155.0 139.0 374.0 368.0 153.0 139.0 97.0 92.0 91.0
2021-02-24 169.0 246.0 283.0 161.0 74.0 166.0 149.0 75.0 169.0 233.0 ... 159.0 202.0 120.0 335.0 361.0 130.0 138.0 96.0 110.0 130.0
2021-02-25 167.0 240.0 192.0 114.0 78.0 139.0 150.0 62.0 124.0 202.0 ... 154.0 140.0 117.0 292.0 352.0 150.0 149.0 91.0 105.0 85.0
2021-02-26 152.0 210.0 220.0 124.0 74.0 125.0 142.0 68.0 145.0 237.0 ... 130.0 146.0 107.0 297.0 343.0 123.0 144.0 82.0 103.0 86.0
2021-02-27 154.0 232.0 203.0 119.0 94.0 137.0 148.0 76.0 129.0 171.0 ... 156.0 151.0 102.0 291.0 320.0 143.0 124.0 75.0 103.0 106.0
2021-02-28 178.0 209.0 179.0 119.0 89.0 143.0 137.0 75.0 125.0 210.0 ... 154.0 140.0 82.0 300.0 307.0 120.0 114.0 83.0 83.0 99.0
2021-03-01 162.0 306.0 236.0 123.0 101.0 170.0 150.0 74.0 143.0 272.0 ... 177.0 189.0 88.0 300.0 437.0 152.0 152.0 108.0 93.0 122.0
2021-03-02 192.0 283.0 251.0 138.0 94.0 163.0 169.0 73.0 134.0 244.0 ... 177.0 196.0 112.0 309.0 361.0 147.0 139.0 91.0 98.0 144.0
2021-03-03 165.0 262.0 264.0 118.0 80.0 182.0 158.0 96.0 115.0 253.0 ... 177.0 187.0 113.0 299.0 385.0 173.0 142.0 112.0 124.0 119.0
2021-03-04 186.0 260.0 277.0 102.0 78.0 153.0 151.0 60.0 106.0 211.0 ... 143.0 159.0 95.0 263.0 379.0 165.0 135.0 113.0 95.0 105.0
2021-03-05 165.0 295.0 244.0 129.0 99.0 192.0 128.0 62.0 128.0 244.0 ... 164.0 171.0 134.0 329.0 428.0 147.0 157.0 115.0 100.0 139.0
2021-03-06 169.0 301.0 249.0 122.0 67.0 201.0 139.0 63.0 133.0 280.0 ... 165.0 194.0 138.0 330.0 413.0 171.0 128.0 101.0 83.0 132.0
2021-03-07 179.0 280.0 242.0 114.0 87.0 166.0 149.0 67.0 113.0 250.0 ... 173.0 197.0 131.0 361.0 427.0 143.0 115.0 111.0 106.0 152.0
2021-03-08 181.0 327.0 275.0 165.0 104.0 187.0 186.0 61.0 142.0 318.0 ... 157.0 259.0 133.0 425.0 547.0 168.0 164.0 143.0 118.0 166.0
2021-03-09 228.0 317.0 289.0 138.0 113.0 197.0 175.0 77.0 126.0 301.0 ... 201.0 249.0 164.0 454.0 501.0 169.0 178.0 143.0 121.0 154.0
2021-03-10 246.0 313.0 309.0 187.0 104.0 264.0 183.0 60.0 169.0 360.0 ... 217.0 243.0 132.0 475.0 521.0 192.0 169.0 126.0 151.0 149.0
2021-03-11 243.0 308.0 274.0 160.0 97.0 214.0 175.0 53.0 143.0 330.0 ... 203.0 261.0 139.0 348.0 531.0 185.0 157.0 134.0 129.0 138.0
2021-03-12 215.0 305.0 293.0 170.0 104.0 214.0 212.0 57.0 146.0 330.0 ... 206.0 248.0 148.0 407.0 537.0 204.0 151.0 149.0 127.0 145.0
2021-03-13 225.0 298.0 337.0 163.0 110.0 216.0 201.0 60.0 140.0 366.0 ... 223.0 256.0 133.0 367.0 520.0 211.0 131.0 122.0 117.0 141.0
2021-03-14 214.0 268.0 251.0 144.0 93.0 171.0 177.0 50.0 130.0 330.0 ... 200.0 221.0 115.0 383.0 494.0 170.0 147.0 134.0 118.0 161.0
2021-03-15 297.0 359.0 283.0 177.0 96.0 244.0 189.0 67.0 160.0 344.0 ... 230.0 311.0 151.0 411.0 565.0 214.0 203.0 154.0 147.0 160.0
2021-03-16 192.0 313.0 288.0 222.0 132.0 241.0 203.0 63.0 154.0 401.0 ... 238.0 313.0 129.0 450.0 524.0 225.0 161.0 127.0 152.0 161.0
2021-03-17 223.0 347.0 305.0 177.0 88.0 249.0 188.0 60.0 150.0 344.0 ... 222.0 293.0 150.0 467.0 506.0 247.0 186.0 138.0 170.0 199.0
2021-03-18 209.0 305.0 254.0 151.0 91.0 160.0 161.0 71.0 128.0 318.0 ... 226.0 249.0 136.0 380.0 510.0 217.0 173.0 137.0 155.0 146.0
2021-03-19 209.0 256.0 265.0 142.0 91.0 176.0 150.0 59.0 93.0 316.0 ... 190.0 214.0 115.0 340.0 349.0 158.0 142.0 103.0 139.0 107.0
2021-03-20 166.0 160.0 250.0 92.0 45.0 120.0 125.0 38.0 53.0 233.0 ... 133.0 156.0 91.0 299.0 259.0 117.0 109.0 65.0 81.0 102.0
2021-03-21 214.0 111.0 262.0 61.0 35.0 63.0 85.0 29.0 46.0 141.0 ... 85.0 112.0 45.0 225.0 465.0 60.0 76.0 65.0 58.0 60.0
2021-03-22 123.0 148.0 138.0 38.0 22.0 38.0 126.0 11.0 90.0 141.0 ... 23.0 189.0 33.0 206.0 329.0 23.0 89.0 14.0 57.0 107.0
2021-03-23 32.0 11.0 14.0 1.0 0.0 1.0 42.0 1.0 16.0 19.0 ... 0.0 53.0 21.0 50.0 12.0 0.0 58.0 0.0 16.0 26.0

424 rows × 25 columns
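pivot_table turns the one-row-per-case frame into daily totals per GGD region, with a (measure, region) column MultiIndex. fillna(0) then records region/day pairs without cases as zero. Dates on which no region reported anything are still absent from the index: 424 rows is fewer than the number of calendar days from 2020-01-01 to 2021-03-23. A small sketch with hypothetical rows (`mini` and `daily_pivot` are illustrative names):

```python
import pandas as pd

# Hypothetical rows after the 0/1 conversion: one row per case.
mini = pd.DataFrame({
    "Date_statistics": pd.to_datetime(
        ["2020-03-01", "2020-03-01", "2020-03-01", "2020-03-03"]),
    "Municipal_health_service": ["GGD Amsterdam", "GGD Amsterdam",
                                 "GGD Drenthe", "GGD Drenthe"],
    "cases": [1, 1, 1, 1],
    "Deceased": [0, 1, 0, 0],
})

# Sum cases and deaths per day (rows) and per region (columns);
# region/day combinations without any case become 0.
daily_pivot = mini.pivot_table(index="Date_statistics",
                               columns="Municipal_health_service",
                               values=["cases", "Deceased"],
                               aggfunc="sum").fillna(0)
print(daily_pivot["cases"])
```

In this sketch 2020-03-02 has no rows at all, so it does not appear in the index. The notebook's next cell handles the same gap by reindexing onto a continuous calendar.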

In [11]:
# Continuous daily calendar, extended 365 days past the last date; days without data become NaN.
new_index = pd.date_range(df_geo.index.min(), df_geo.index.max() + pd.Timedelta('365 days'))
df_geo = df_geo.reindex(new_index)
df_geo
Out[11]:
Deceased ... cases
Municipal_health_service Dienst Gezondheid & Jeugd ZHZ GGD Amsterdam GGD Brabant-Zuidoost GGD Drenthe GGD Flevoland GGD Fryslân GGD Gelderland-Zuid GGD Gooi en Vechtstreek GGD Groningen GGD Haaglanden ... GGD Limburg-Noord GGD Noord- en Oost-Gelderland GGD Regio Twente GGD Regio Utrecht GGD Rotterdam-Rijnmond GGD West-Brabant GGD Zaanstreek/Waterland GGD Zeeland GGD Zuid-Limburg Veiligheids- en Gezondheidsregio Gelderland-Midden
2020-01-01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
2020-01-02 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-04 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-05 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-06 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2020-01-07 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-08 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-09 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-12 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-13 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-14 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-15 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
2020-01-17 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-19 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-20 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-21 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-22 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-24 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2020-01-25 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-26 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
2020-01-27 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
2020-01-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-01-29 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2020-01-30 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2022-02-22 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-02-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-02-24 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-02-25 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-02-26 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-02-27 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-02-28 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-02 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-05 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-06 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-07 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-08 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-09 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-12 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-13 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-14 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-15 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-16 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-17 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-19 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-21 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-22 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2022-03-23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

813 rows × 50 columns
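reindex keeps every existing row and adds the missing calendar days as NaN rows, which are visible in early 2020 above. It also extends the index 365 days past the last date, leaving room to project the fitted waves forward. The daynum column in the next cell then turns dates into a plain day counter for the regression. A sketch with hypothetical numbers and a 10-day extension instead of 365 (`daily_demo` and `calendar_demo` are illustrative names):

```python
import pandas as pd

# Hypothetical daily totals with a gap (2020-01-02 and 2020-01-03 missing).
daily_demo = pd.DataFrame({"cases": [1.0, 1.0, 2.0]},
                          index=pd.to_datetime(["2020-01-01", "2020-01-04", "2020-01-05"]))

# Continuous calendar from the first date to 10 days after the last one.
full_index = pd.date_range(daily_demo.index.min(),
                           daily_demo.index.max() + pd.Timedelta("10 days"))
calendar_demo = daily_demo.reindex(full_index)

# Day counter used as the regression input.
calendar_demo["daynum"] = (calendar_demo.index - calendar_demo.index.min()).days
print(calendar_demo.head(6))
```

Missing past days stay NaN rather than 0. With the default window settings, rolling(7).mean() therefore returns NaN for any 7-day window that touches such a gap. `reindex` also accepts `fill_value=0`, but that would fill the future rows with zeros as well.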

In [12]:
# Days since the first date: the x variable for the wave fits.
df_geo['daynum'] = (df_geo.index - df_geo.index.min()).days
df_geo['daynum'].describe()
Out[12]:
count    813.000000
mean     406.000000
std      234.837178
min        0.000000
25%      203.000000
50%      406.000000
75%      609.000000
max      812.000000
Name: daynum, dtype: float64
In [13]:
def gumpdf(x, beta, mu):
    """Return the Gumbel probability density at x (scale beta, location mu)."""
    expon = -((x - mu) / beta)
    return np.exp(expon) * np.exp(-np.exp(expon)) / beta

def gumcdf(x, beta, mu):
    """Return the Gumbel cumulative distribution at x (scale beta, location mu)."""
    expon = -((x - mu) / beta)
    return np.exp(-np.exp(expon))
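These functions are the Gumbel density f and distribution F. Dividing them gives f(x)/F(x) = (1/beta) * exp(-(x - mu)/beta). Therefore ln(f/F) = -ln(beta) - (x - mu)/beta, which is a straight line in x with slope -1/beta. The wave loop further down computes the discrete analogue, cases / cases.cumsum() (the 'gumdiv' column), takes its logarithm and fits straight-line pieces to it. Reading that loop as a Gumbel fit is our interpretation of the code. The sketch below checks the identity numerically with hypothetical parameters beta = 20 and mu = 60 and recovers both parameters from the slope and intercept with np.polyfit.

```python
import numpy as np

def gumpdf(x, beta, mu):
    """Gumbel probability density with scale beta and location mu."""
    expon = -((x - mu) / beta)
    return np.exp(expon) * np.exp(-np.exp(expon)) / beta

def gumcdf(x, beta, mu):
    """Gumbel cumulative distribution with scale beta and location mu."""
    expon = -((x - mu) / beta)
    return np.exp(-np.exp(expon))

# Hypothetical wave parameters, not estimated from the RIVM data.
true_beta, true_mu = 20.0, 60.0
days = np.arange(30, 120, dtype=float)

# Continuous analogue of cases / cases.cumsum(): density over cumulative share.
ratio = gumpdf(days, true_beta, true_mu) / gumcdf(days, true_beta, true_mu)

# ln(f/F) = -ln(beta) - (x - mu)/beta is linear in x, so a degree-1 fit
# gives slope = -1/beta and intercept = -ln(beta) + mu/beta.
slope, intercept = np.polyfit(days, np.log(ratio), 1)
beta_hat = -1.0 / slope
mu_hat = beta_hat * (intercept + np.log(beta_hat))
print(f"beta = {beta_hat:.3f}, mu = {mu_hat:.3f}")
```

With real, noisy case counts the log ratio is only approximately linear within a single wave. This is presumably why the notebook fits a piecewise-linear model rather than a single line.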
In [14]:
import matplotlib as mpl

# Allow many open figures (one plot per region) without a warning.
mpl.rc('figure', max_open_warning=0)
In [15]:
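regions = np.sort(df[region].unique())
# Positions 2, 1, 4, 8 select GGD Brabant-Zuidoost, GGD Amsterdam, GGD Flevoland and GGD Groningen.
ind_pos = [2, 1, 4, 8]
regions1 = regions[ind_pos]
# To analyse all regions instead:
# regions1 = regions
regions
# regions1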
Out[15]:
array(['Dienst Gezondheid & Jeugd ZHZ', 'GGD Amsterdam',
       'GGD Brabant-Zuidoost', 'GGD Drenthe', 'GGD Flevoland',
       'GGD Fryslân', 'GGD Gelderland-Zuid', 'GGD Gooi en Vechtstreek',
       'GGD Groningen', 'GGD Haaglanden', 'GGD Hart voor Brabant',
       'GGD Hollands-Midden', 'GGD Hollands-Noorden', 'GGD IJsselland',
       'GGD Kennemerland', 'GGD Limburg-Noord',
       'GGD Noord- en Oost-Gelderland', 'GGD Regio Twente',
       'GGD Regio Utrecht', 'GGD Rotterdam-Rijnmond', 'GGD West-Brabant',
       'GGD Zaanstreek/Waterland', 'GGD Zeeland', 'GGD Zuid-Limburg',
       'Veiligheids- en Gezondheidsregio Gelderland-Midden'], dtype=object)
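The indices in ind_pos depend on the sorted order of the region names. If the data set ever gains or renames a GGD region, [2, 1, 4, 8] may silently point to different regions. Below is a sketch of selecting the same four regions by name instead. The name list is copied from the output above, while `region_names`, `wanted` and `missing` are illustrative names.

```python
import numpy as np

# First nine sorted region names, copied from the output above.
region_names = np.array([
    'Dienst Gezondheid & Jeugd ZHZ', 'GGD Amsterdam', 'GGD Brabant-Zuidoost',
    'GGD Drenthe', 'GGD Flevoland', 'GGD Fryslân', 'GGD Gelderland-Zuid',
    'GGD Gooi en Vechtstreek', 'GGD Groningen'])

# Positional selection, as done with ind_pos in the notebook.
by_position = region_names[[2, 1, 4, 8]]

# Equivalent selection by name; fails loudly if a name is misspelled or
# missing from the data instead of silently picking another region.
wanted = ['GGD Brabant-Zuidoost', 'GGD Amsterdam', 'GGD Flevoland', 'GGD Groningen']
missing = [name for name in wanted if name not in region_names]
if missing:
    raise ValueError(f"unknown regions: {missing}")
print(by_position.tolist() == wanted)
```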

Corona wave level of each area.
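The next cell fits py-earth's Earth model (MARS, a piecewise-linear regression) to the log of daily cases divided by cumulative cases. Days where the fitted slope changes are treated as knots, that is, candidate boundaries between waves. The sketch below swaps Earth for a much simpler technique: a least-squares fit with a single hinge term max(0, x - knot) and a brute-force search over knot positions. It runs on a hypothetical two-segment series; `hinge_fit` and `demo` are illustrative names. It then applies the notebook's knot rule, which flags a change in the day-to-day slope larger than 1e-6.

```python
import numpy as np
import pandas as pd

# Hypothetical log-ratio series: two straight segments meeting at day 50,
# standing in for np.log(cases / cases.cumsum()) across two waves.
days = np.arange(0, 100, dtype=float)
y = np.where(days < 50, -1.0 - 0.05 * days, -3.5 - 0.02 * (days - 50))

def hinge_fit(x, y, knot):
    """Fit y ~ a + b*x + c*max(0, x - knot) by least squares; return fit and SSE."""
    X = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - knot)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return fitted, float(np.sum((fitted - y) ** 2))

# Brute-force search over interior knots (Earth uses a forward/backward
# pass with many basis functions; this single-knot search is a stand-in).
candidates = days[5:-5]
sses = [hinge_fit(days, y, k)[1] for k in candidates]
best_knot = candidates[int(np.argmin(sses))]
fitted, _ = hinge_fit(days, y, best_knot)

# Notebook-style knot rule: the day-to-day slope of the fit changes.
# (The notebook additionally flags the first and last fitted days.)
demo = pd.DataFrame({"daynum": days, "earth": fitted})
demo["eagrad"] = demo["earth"] - demo["earth"].shift(1)
changed = (demo["eagrad"] - demo["eagrad"].shift(1)).abs() > 1e-6
print(best_knot, demo.loc[changed, "daynum"].tolist())
```

The rule flags day 51, the first day whose difference already follows the new slope, although the kink itself sits at day 50. The notebook's knot column has the same one-day offset.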

In [16]:
# Select regions to fit.
regions = regions1

# Choose whether to output plots per region.
showplots = True

# Among the important areas, GGD Brabant-Zuidoost seems a good region to focus on.
# We measure how the cases go up and down and identify where the waves are.
# Column labels for the measured, smoothed, remaining, predicted and per-wave series:
measure  = cases
smeasure = 'Week window' # smoothed (7-day rolling mean)
rmeasure = 'rcases'      # remaining
pmeasure = 'Model'       # predicted wave
wmeasure = 'Wave '       # waves
for region in regions:
    # Start with the first wave for this region.
    wave = 1
    # Initialise the model column and the smoothed/remaining series (up to lastdate).
    df_geo[(pmeasure, region)] = 0
    df_geo[(smeasure, region)] = df_geo[measure][region].loc[:lastdate].rolling(7).mean()
    df_geo[(rmeasure, region)] = df_geo[smeasure][region]

    plotlist = [(smeasure, region), (pmeasure, region)]

    # Minimum smoothed daily case count for a day to enter the fit.
    mincases = 2
    # Alternative: scale the threshold with the region's total.
    # mincases = df_geo[smeasure][region].sum() / 5000

    print("Running multiple wave analysis for '{}'".format(region))
    print('Minimum number of cases is {:1.0f}'.format(mincases))
    # Fit one wave per iteration (loop adapted from an external reference).
    while True:
        curwave = wmeasure + str((wave) + 1000)[-2:]
        df_geo[(curwave, region)] = 0

        df_pred = pd.DataFrame({'daynum':df_geo['daynum'],
                                measure:df_geo[rmeasure][region]})

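        # Daily (remaining) cases divided by the running total; for a Gumbel-shaped
        # wave the log of this ratio falls roughly linearly with time.
        df_pred['gumdiv'] = df_pred[measure] / df_pred[measure].cumsum()
        df_pred = df_pred[(df_pred['gumdiv'] > 0) & (df_pred[measure] > mincases)]

        df_pred['linear'] = np.log(df_pred['gumdiv'])

        # Keep only the points inside the log-ratio window used for the fit.
        df_pred = df_pred[(df_pred['linear'] < -0.5) &
                          (df_pred['linear'] > -3.5)]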

        if len(df_pred) <= 1:
            print('--- no data left')
            break
        # Fit a piecewise-linear MARS model (py-earth Earth) to log(daily / cumulative) against the day number.
        eax = df_pred['daynum'].values.reshape(-1, 1)
        eay = df_pred['linear'].values.reshape(-1, 1)

        #eamodel = Earth()
        #eamodel = Earth(minspan=0)
        eamodel = Earth(minspan=1, penalty=0, endspan=0, thresh=1e-9, check_every=1)
        eamodel.fit(eax, eay)

        df_pred['earth'] = eamodel.predict(eax)
        # First and last day still inside the fitting window.
        daymin = df_pred['daynum'].min()
        daymax = df_pred['daynum'].max()

        #df_pred['gbgrad'] = np.gradient(df_pred['linear'])
        #df_pred['eagrad'] = np.gradient(df_pred['earth'])
        # Day-to-day change of the raw log ratio and of the fitted curve, i.e. their local slopes.
        df_pred['gbgrad'] = df_pred['linear'] - df_pred['linear'].shift(1)
        df_pred['eagrad'] = df_pred['earth'] - df_pred['earth'].shift(1)

        fitmod = export.export_python_function(eamodel)
        # Knots: days where the fitted slope changes, plus the start and end of the window.
        df_pred['knot'] = ((abs(df_pred['eagrad'] - df_pred['eagrad'].shift(1)) > 1e-6) |
                           (df_pred['daynum'] == (daymin + 1)) |
                           (df_pred['daynum'] == daymax))
        df_pred['daycount'] = df_pred.reset_index().index

        df_knot = df_pred[df_pred['knot']][['daynum', 'daycount', 'eagrad']]
        df_knot['daysdata'] = df_knot['daycount'].shift(-1) - df_knot['daycount']
        df_knot['daystime'] = df_knot['daynum'].shift(-1) - df_knot['daynum']

        df_knot['cand'] = ((df_knot['eagrad'] < -1/77) &
                           (df_knot['daysdata'] >= 3))

        df_knot['since'] = df_knot['daynum'] - daymin
        df_knot['score'] = (df_knot['eagrad'] ** 2) * np.sqrt(df_knot['daysdata'] / np.sqrt(df_knot['since']))
        df_knot['choice'] = df_knot['score'] == df_knot[df_knot['cand']]['score'].max()

        choice = df_knot[df_knot['choice']]
        if len(choice) == 0:
            print('--- no data for wave')
            break

        lower = choice['daynum'].values[0]
        upper = choice['daysdata'].values[0] + lower

        df_pred = df_pred[(df_pred['daynum'] >= lower) &
                          (df_pred['daynum'] <= upper)].copy()

        slope = (fitmod([[upper]])[0] - fitmod([[lower]])[0]) / (upper - lower)
        intercept = fitmod([[lower]])[0] - (lower * slope)
        # The straight segment gives the Gumbel parameters: slope = -1/beta, intercept = mu/beta - log(beta).
        beta = - 1 / slope
        mu = beta * (intercept + np.log(beta))

        df_pred['pgumb'] = gumpdf(df_pred['daynum'], beta, mu)
        df_pred['scale'] = df_pred[measure] / df_pred['pgumb']

        final = df_pred['scale'].mean()
        fincv = df_pred['scale'].std() / final
        # fincv: coefficient of variation of the per-day scale estimates; the printed fit is (1 - fincv) ** 2.
        df_geo[(curwave, region)] = final * gumpdf(df_geo['daynum'], beta, mu)

        peak = df_geo[df_geo[(curwave, region)] == df_geo[(curwave, region)].max()].index.min()
        start = df_geo[(df_geo[(curwave, region)] >= 1) &
                       (df_geo[(curwave, region)].index < peak)].index.min()
        floor = df_geo[(df_geo[(curwave, region)] < 1) &
                       (df_geo[(curwave, region)].index > peak)].index.min()
        # Peak of this wave, its start (first day with >= 1 case before the peak) and its end (first day with < 1 case after the peak).
        print('{} beta {:6.3f} mu {:3.0f} fit {:5.3f} peak {} from {} to {} size {:1.0f}'.format(
            curwave, beta, mu, (1 - fincv) ** 2, peak.date(), start.date(), floor.date(), final))

        df_geo[(pmeasure, region)] += df_geo[(curwave, region)]
        df_geo[(rmeasure, region)] -= df_geo[(curwave, region)]
        plotlist += [(curwave, region)]
        wave += 1

    if showplots:
        df_geo[plotlist].loc['20200101':'20210404'].plot(
            figsize=(16, 9),
            grid=True,
            kind='area',
            stacked=False,
            alpha=1/3,
            title='Daily new cases for '+region, 
            colormap="Dark2")

        df_geo[plotlist].loc['20200101':'20210404'].cumsum().plot(
            figsize=(16, 9),
            grid=True,
            kind='area',
            stacked=False,
            alpha=1/3,
            title='Cumulative cases for '+region, 
            colormap="magma")
Running multiple wave analysis for 'GGD Brabant-Zuidoost'
Minimum number of cases is 2
Wave 01 beta  7.919 mu  69 fit 0.927 peak 2020-03-10 from 2020-02-27 to 2020-04-07 size 259
Wave 02 beta  6.819 mu  78 fit 0.864 peak 2020-03-19 from 2020-03-06 to 2020-04-20 size 773
Wave 03 beta  7.417 mu  90 fit 0.874 peak 2020-03-31 from 2020-03-17 to 2020-05-05 size 865
Wave 04 beta  8.746 mu 105 fit 0.821 peak 2020-04-15 from 2020-03-31 to 2020-05-24 size 682
Wave 05 beta  3.353 mu 123 fit 0.812 peak 2020-05-03 from 2020-04-29 to 2020-05-11 size 33
Wave 06 beta  4.451 mu 135 fit 0.697 peak 2020-05-15 from 2020-05-10 to 2020-05-27 size 58
Wave 07 beta  8.234 mu 154 fit 0.868 peak 2020-06-03 from 2020-05-21 to 2020-06-30 size 218
Wave 08 beta  3.750 mu 166 fit 0.652 peak 2020-06-15 from 2020-06-11 to 2020-06-24 size 41
Wave 09 beta  4.944 mu 197 fit 0.854 peak 2020-07-16 from 2020-07-10 to 2020-07-27 size 46
Wave 10 beta 11.598 mu 220 fit 0.821 peak 2020-08-08 from 2020-07-21 to 2020-09-18 size 385
Wave 11 beta  8.747 mu 236 fit 0.794 peak 2020-08-24 from 2020-08-10 to 2020-09-24 size 308
Wave 12 beta 13.351 mu 257 fit 0.893 peak 2020-09-14 from 2020-08-22 to 2020-11-13 size 1103
Wave 13 beta 15.392 mu 277 fit 0.850 peak 2020-10-04 from 2020-09-03 to 2021-01-02 size 4926
Wave 14 beta  8.683 mu 284 fit 0.907 peak 2020-10-11 from 2020-09-23 to 2020-12-04 size 4201
Wave 15 beta  5.779 mu 292 fit 0.945 peak 2020-10-19 from 2020-10-08 to 2020-11-24 size 2329
Wave 16 beta  6.224 mu 301 fit 0.936 peak 2020-10-28 from 2020-10-15 to 2020-12-08 size 4176
Wave 17 beta  6.422 mu 313 fit 0.843 peak 2020-11-09 from 2020-10-27 to 2020-12-16 size 1876
Wave 18 beta  8.065 mu 326 fit 0.915 peak 2020-11-22 from 2020-11-05 to 2021-01-12 size 4318
Wave 19 beta  6.526 mu 342 fit 0.913 peak 2020-12-08 from 2020-11-25 to 2021-01-19 size 3920
Wave 20 beta  6.256 mu 351 fit 0.940 peak 2020-12-17 from 2020-12-03 to 2021-01-28 size 4951
Wave 21 beta  4.684 mu 356 fit 0.956 peak 2020-12-22 from 2020-12-13 to 2021-01-21 size 2340
Wave 22 beta  6.345 mu 368 fit 0.909 peak 2021-01-03 from 2020-12-21 to 2021-02-14 size 3856
Wave 23 beta  4.995 mu 376 fit 0.933 peak 2021-01-11 from 2021-01-01 to 2021-02-11 size 2136
Wave 24 beta  6.300 mu 386 fit 0.925 peak 2021-01-21 from 2021-01-09 to 2021-03-03 size 3471
Wave 25 beta  6.607 mu 396 fit 0.871 peak 2021-01-31 from 2021-01-17 to 2021-03-12 size 2684
Wave 26 beta  5.915 mu 407 fit 0.907 peak 2021-02-11 from 2021-01-31 to 2021-03-18 size 1905
Wave 27 beta  5.521 mu 417 fit 0.942 peak 2021-02-21 from 2021-02-10 to 2021-03-27 size 2367
Wave 28 beta  5.133 mu 423 fit 0.934 peak 2021-02-27 from 2021-02-17 to 2021-03-29 size 1456
Wave 29 beta  4.914 mu 432 fit 0.905 peak 2021-03-08 from 2021-02-26 to 2021-04-07 size 2100
Wave 30 beta  5.908 mu 441 fit 0.960 peak 2021-03-17 from 2021-03-05 to 2021-04-24 size 3191
--- no data left
Running multiple wave analysis for 'GGD Amsterdam'
Minimum number of cases is 2
Wave 01 beta 13.770 mu  85 fit 0.918 peak 2020-03-26 from 2020-02-28 to 2020-06-04 size 2256
Wave 02 beta  4.793 mu  99 fit 0.868 peak 2020-04-09 from 2020-04-02 to 2020-04-27 size 178
Wave 03 beta  4.732 mu 107 fit 0.832 peak 2020-04-17 from 2020-04-10 to 2020-05-04 size 176
Wave 04 beta  5.098 mu 119 fit 0.874 peak 2020-04-29 from 2020-04-21 to 2020-05-17 size 174
Wave 05 beta  6.661 mu 132 fit 0.679 peak 2020-05-12 from 2020-05-03 to 2020-06-01 size 118
Wave 06 beta  7.098 mu 150 fit 0.791 peak 2020-05-30 from 2020-05-20 to 2020-06-21 size 144
Wave 07 beta  7.235 mu 163 fit 0.851 peak 2020-06-12 from 2020-06-01 to 2020-07-06 size 199
Wave 08 beta  4.769 mu 179 fit 0.776 peak 2020-06-28 from 2020-06-23 to 2020-07-09 size 48
Wave 09 beta 15.113 mu 212 fit 0.930 peak 2020-07-31 from 2020-07-03 to 2020-10-08 size 1448
Wave 10 beta  9.501 mu 222 fit 0.838 peak 2020-08-10 from 2020-07-22 to 2020-10-02 size 2315
Wave 11 beta 12.760 mu 250 fit 0.848 peak 2020-09-07 from 2020-08-13 to 2020-11-15 size 2736
Wave 12 beta 11.416 mu 262 fit 0.899 peak 2020-09-19 from 2020-08-26 to 2020-12-03 size 7493
Wave 13 beta  5.454 mu 268 fit 0.920 peak 2020-09-25 from 2020-09-14 to 2020-10-28 size 2077
Wave 14 beta  6.505 mu 280 fit 0.907 peak 2020-10-07 from 2020-09-23 to 2020-11-22 size 7037
Wave 15 beta  9.158 mu 296 fit 0.901 peak 2020-10-23 from 2020-10-02 to 2020-12-30 size 15578
Wave 16 beta  5.684 mu 318 fit 0.944 peak 2020-11-14 from 2020-11-02 to 2020-12-21 size 3318
Wave 17 beta  5.683 mu 328 fit 0.893 peak 2020-11-24 from 2020-11-13 to 2020-12-29 size 2503
Wave 18 beta  7.887 mu 342 fit 0.931 peak 2020-12-08 from 2020-11-21 to 2021-01-31 size 6574
Wave 19 beta  9.609 mu 356 fit 0.954 peak 2020-12-22 from 2020-12-01 to 2021-02-27 size 9944
Wave 20 beta  6.777 mu 376 fit 0.879 peak 2021-01-11 from 2020-12-28 to 2021-02-22 size 3183
--- no data for wave
Running multiple wave analysis for 'GGD Flevoland'
Minimum number of cases is 2
Wave 01 beta 17.624 mu  92 fit 0.771 peak 2020-04-02 from 2020-03-04 to 2020-06-09 size 828
Wave 02 beta  3.953 mu 101 fit 0.575 peak 2020-04-11 from 2020-04-06 to 2020-04-22 size 55
Wave 03 beta  7.266 mu 127 fit 0.855 peak 2020-05-07 from 2020-04-28 to 2020-05-23 size 74
Wave 04 beta 25.220 mu 238 fit 0.778 peak 2020-08-26 from 2020-07-16 to 2020-11-23 size 892
Wave 05 beta  7.949 mu 256 fit 0.826 peak 2020-09-13 from 2020-08-30 to 2020-10-17 size 529
Wave 06 beta  7.997 mu 270 fit 0.802 peak 2020-09-27 from 2020-09-13 to 2020-10-31 size 542
Wave 07 beta  9.915 mu 285 fit 0.844 peak 2020-10-12 from 2020-09-22 to 2020-12-08 size 3010
Wave 08 beta  6.610 mu 297 fit 0.897 peak 2020-10-24 from 2020-10-11 to 2020-11-28 size 1313
Wave 09 beta  5.624 mu 304 fit 0.918 peak 2020-10-31 from 2020-10-21 to 2020-11-30 size 1000
Wave 10 beta 10.172 mu 324 fit 0.923 peak 2020-11-20 from 2020-10-30 to 2021-01-21 size 4359
Wave 11 beta 13.345 mu 353 fit 0.886 peak 2020-12-19 from 2020-11-20 to 2021-03-19 size 10696
Wave 12 beta 26.316 mu 405 fit 0.700 peak 2021-02-09 from 2020-12-20 to 2021-06-21 size 3942
Wave 13 beta 13.233 mu 437 fit 0.786 peak 2021-03-13 from 2021-02-16 to 2021-05-21 size 2306
--- no data for wave
Running multiple wave analysis for 'GGD Groningen'
Minimum number of cases is 2
Wave 01 beta  7.268 mu  78 fit 0.876 peak 2020-03-19 from 2020-03-08 to 2020-04-14 size 252
Wave 02 beta  5.943 mu  99 fit 0.779 peak 2020-04-09 from 2020-04-01 to 2020-04-24 size 77
Wave 03 beta 38.567 mu 302 fit 0.868 peak 2020-10-29 from 2020-08-09 to 2021-06-19 size 16336
Wave 04 beta  4.407 mu 360 fit 0.790 peak 2020-12-26 from 2020-12-18 to 2021-01-23 size 1909
Wave 05 beta  4.813 mu 370 fit 0.853 peak 2021-01-05 from 2020-12-27 to 2021-02-02 size 1373
Wave 06 beta  7.857 mu 385 fit 0.815 peak 2021-01-20 from 2021-01-04 to 2021-03-06 size 2261
Wave 07 beta 14.113 mu 419 fit 0.907 peak 2021-02-23 from 2021-01-25 to 2021-05-18 size 5143
Wave 08 beta  5.520 mu 440 fit 0.833 peak 2021-03-16 from 2021-03-06 to 2021-04-14 size 961
Wave 09 beta 37.042 mu 454 fit 0.166 peak 2021-03-30 from 2021-01-08 to 2021-12-15 size 40795
--- no data for wave
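The peeling loop above relies on one property of the Gumbel curve. With z = (x - mu) / beta, the Gumbel density is f(x) = exp(-(z + exp(-z))) / beta and its cumulative distribution is F(x) = exp(-exp(-z)), so log(f(x) / F(x)) = -(x - mu) / beta - log(beta). For a single wave, daily cases divided by cumulative cases approximate f / F, so that log ratio is a straight line in the day number with slope -1/beta and intercept mu/beta - log(beta); this is why the code sets beta = -1/slope and mu = beta * (intercept + log(beta)). The sketch below checks that inversion on a synthetic wave. It is a minimal illustration under stated assumptions, not the notebook's method: `gumbel_pdf`, `gumbel_cdf` and `fit_gumbel_wave` are hypothetical helpers, `gumbel_pdf` is assumed to match the notebook's `gumpdf` (defined outside this section), a plain `np.polyfit` line replaces the MARS `Earth` model and its knot selection, and the exact cumulative curve replaces the running `cumsum()`, which adds some discretisation bias on real data.

```python
import numpy as np

def gumbel_pdf(x, beta, mu):
    """Gumbel (maximum) density; assumed to match the notebook's gumpdf()."""
    z = (x - mu) / beta
    return np.exp(-(z + np.exp(-z))) / beta

def gumbel_cdf(x, beta, mu):
    """Gumbel cumulative distribution function."""
    z = (x - mu) / beta
    return np.exp(-np.exp(-z))

def fit_gumbel_wave(days, daily, cumulative):
    """Hypothetical helper: recover (beta, mu, size, fit) for one wave.

    A straight line fitted with np.polyfit stands in for the notebook's
    MARS (Earth) model and knot selection.
    """
    linear = np.log(daily / cumulative)
    # Same window as the notebook: keep -3.5 < log(daily / cumulative) < -0.5.
    keep = (linear < -0.5) & (linear > -3.5)
    slope, intercept = np.polyfit(days[keep], linear[keep], 1)
    beta = -1.0 / slope
    mu = beta * (intercept + np.log(beta))
    # Per-day scale factor, as in df_pred['scale']; its mean is the wave size.
    scale = daily[keep] / gumbel_pdf(days[keep], beta, mu)
    size = scale.mean()
    fincv = scale.std(ddof=1) / size  # ddof=1 matches pandas' Series.std()
    return beta, mu, size, (1 - fincv) ** 2

# Synthetic wave with known parameters (hypothetical values, not RIVM data).
true_beta, true_mu, true_size = 6.0, 300.0, 4000.0
days = np.arange(280, 341, dtype=float)
daily = true_size * gumbel_pdf(days, true_beta, true_mu)
# Exact cumulative curve; the notebook uses a running cumsum() instead.
cumulative = true_size * gumbel_cdf(days, true_beta, true_mu)

beta, mu, size, fit = fit_gumbel_wave(days, daily, cumulative)
print('beta {:6.3f} mu {:3.0f} size {:1.0f} fit {:5.3f}'.format(beta, mu, size, fit))
# -> beta  6.000 mu 300 size 4000 fit 1.000
```

On real data the recovered parameters are less exact than on this noise-free wave; the `fit` column in the output above is the notebook's measure of how well each wave matches the Gumbel shape.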

This shows the cases per week. In April 2020 the number of recorded corona cases was still very low in comparison, and it started to skyrocket in January 2021. Now that April 2021 and summer are approaching, the dataset shows case numbers surging again around week 12. A probabilistic approach is therefore needed to understand the current situation properly.

In [17]:
# !wget -N https://data.rivm.nl/covid-19/COVID-19_casus_landelijk.csv
In [18]:
import pandas as pd
import numpy  as np

df_case = pd.read_csv(
    'COVID-19_casus_landelijk.csv',
    sep=';',
    parse_dates=[0, 1],
    infer_datetime_format=True)

df_case.tail(10)

# Loaded with pandas above. Spark alternative for big data, kept for reference:
# from pyspark.sql.types import DateType
# df_case_spark = spark.read.options(header='True', inferSchema='True', delimiter=';').csv('COVID-19_casus_landelijk.csv')
# df_case_spark.show(10)
# df_case_spark = df_case_spark.withColumn("Date_file",
#     df_case_spark["Date_file"].cast(DateType()))
# df_case_spark = df_case_spark.withColumn("Date_statistics",
#     df_case_spark["Date_statistics"].cast(DateType()))
# df_case_spark.printSchema()

df_case.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1236209 entries, 0 to 1236208
Data columns (total 10 columns):
Date_file                   1236209 non-null datetime64[ns]
Date_statistics             1236209 non-null datetime64[ns]
Date_statistics_type        1236209 non-null object
Agegroup                    1236209 non-null object
Sex                         1236209 non-null object
Province                    1236209 non-null object
Hospital_admission          1236209 non-null object
Deceased                    1236209 non-null object
Week_of_death               16420 non-null float64
Municipal_health_service    1236209 non-null object
dtypes: datetime64[ns](2), float64(1), object(7)
memory usage: 94.3+ MB
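The RIVM file is semicolon-separated, which is why `sep=';'` is needed, and `parse_dates=[0, 1]` turns the first two columns into `datetime64` values. Empty fields are read as NaN, which explains why `Week_of_death` comes out as `float64` with only 16420 non-null entries, presumably because it is only filled in for deceased cases. The sketch below reproduces this on two rows copied from the `Out[19]` output, using an in-memory string instead of the downloaded file; `infer_datetime_format` is left out because recent pandas versions deprecate it.

```python
import io

import pandas as pd

# Two rows in the layout of COVID-19_casus_landelijk.csv (values from Out[19]).
sample_csv = io.StringIO(
    'Date_file;Date_statistics;Date_statistics_type;Agegroup;Sex;Province;'
    'Hospital_admission;Deceased;Week_of_death;Municipal_health_service\n'
    '2021-03-26 10:00:00;2021-03-26;DPL;0-9;Male;Zuid-Holland;Unknown;Unknown;;'
    'Dienst Gezondheid & Jeugd ZHZ\n'
    '2021-03-26 10:00:00;2021-03-26;DON;20-29;Male;Zuid-Holland;Unknown;Unknown;;'
    'GGD Haaglanden\n')

# Semicolon separator; parse the first two columns as dates.
sample = pd.read_csv(sample_csv, sep=';', parse_dates=[0, 1])

# Date columns are datetime64, and the empty Week_of_death field becomes NaN (float64).
print(sample.dtypes)
```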
In [19]:
# ISO year and ISO week of each case, as a label such as '2021W12'.
# The ISO year (isocalendar()[0]) keeps early-January days in the previous year's last week.
df_case['period'] = df_case['Date_statistics'].apply(lambda x: x.isocalendar()[0]) * 100 + df_case['Date_statistics'].apply(lambda x: x.isocalendar()[1])
df_case['period'] = df_case['period'].apply(str)
df_case['period'] = df_case['period'].apply(lambda x: x[:4] + 'W' + x[4:])

# Combine sex and age group into one label, e.g. 'Male 0-9'.
df_case['group'] = df_case['Sex'] + ' ' + df_case['Agegroup']

df_case.tail(10)
Out[19]:
Date_file Date_statistics Date_statistics_type Agegroup Sex Province Hospital_admission Deceased Week_of_death Municipal_health_service period group
1236199 2021-03-26 10:00:00 2021-03-26 DPL 0-9 Male Zuid-Holland Unknown Unknown NaN Dienst Gezondheid & Jeugd ZHZ 2021W12 Male 0-9
1236200 2021-03-26 10:00:00 2021-03-26 DPL 20-29 Male Fryslân Unknown Unknown NaN GGD Groningen 2021W12 Male 20-29
1236201 2021-03-26 10:00:00 2021-03-26 DON 20-29 Male Zuid-Holland Unknown Unknown NaN GGD Haaglanden 2021W12 Male 20-29
1236202 2021-03-26 10:00:00 2021-03-26 DPL 60-69 Male Gelderland Unknown Unknown NaN Veiligheids- en Gezondheidsregio Gelderland-Mi... 2021W12 Male 60-69
1236203 2021-03-26 10:00:00 2021-03-26 DPL 50-59 Female Gelderland Unknown Unknown NaN GGD Noord- en Oost-Gelderland 2021W12 Female 50-59
1236204 2021-03-26 10:00:00 2021-03-26 DON 30-39 Female Gelderland Unknown Unknown NaN Veiligheids- en Gezondheidsregio Gelderland-Mi... 2021W12 Female 30-39
1236205 2021-03-26 10:00:00 2021-03-26 DPL 40-49 Male Noord-Holland Unknown Unknown NaN GGD Amsterdam 2021W12 Male 40-49
1236206 2021-03-26 10:00:00 2021-03-26 DPL 80-89 Male Zuid-Holland Unknown Unknown NaN GGD Hollands-Midden 2021W12 Male 80-89
1236207 2021-03-26 10:00:00 2021-03-26 DPL 60-69 Male Zuid-Holland Unknown Unknown NaN GGD Rotterdam-Rijnmond 2021W12 Male 60-69
1236208 2021-03-26 10:00:00 2021-03-26 DPL 20-29 Female Gelderland Unknown Unknown NaN GGD Noord- en Oost-Gelderland 2021W12 Female 20-29
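The `period` label combines the ISO year (`isocalendar()[0]`) with the ISO week (`isocalendar()[1]`). Using the ISO year rather than the calendar year matters around New Year: Sunday 3 January 2021 belongs to ISO week 53 of 2020, so a calendar-year label would wrongly read `2021W53`. The helper below is a hypothetical one-step equivalent of the three `apply` calls above (it zero-pads the week the same way), shown on a few sample dates.

```python
import pandas as pd

def iso_period_label(ts):
    """Hypothetical helper: 'YYYYWnn' label from the ISO year and ISO week.

    Gives the same result as the three apply() calls in the cell above.
    """
    iso_year, iso_week = ts.isocalendar()[:2]
    return '{}W{:02d}'.format(iso_year, iso_week)

# Sunday 3 Jan 2021 still belongs to ISO week 53 of 2020.
dates = pd.Series(pd.to_datetime(['2021-01-03', '2021-01-04', '2021-03-26']))
labels = dates.apply(iso_period_label)
print(list(labels))  # -> ['2020W53', '2021W01', '2021W12']
```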
In [20]:
data_df_heat_map = df_case[df_case['Municipal_health_service'] != ''].pivot_table(
    index='period', 
    columns='group', 
    values='Date_statistics', 
    aggfunc='count').fillna(0)

# Keep the first 24 columns: the Female and Male groups (the columns are sorted alphabetically).
# To add a total instead: data_df_heat_map['total'] = data_df_heat_map[data_df_heat_map.columns[0:24]].sum(axis=1)
data_df_heat_map = data_df_heat_map[data_df_heat_map.columns[0:24]]

# Optional: relative week-on-week growth instead of absolute counts
# data_df_heat_map = data_df_heat_map / data_df_heat_map.shift() 

data_df_heat_map.tail(5).loc[::-1].transpose() 
Out[20]:
period 2021W12 2021W11 2021W10 2021W09 2021W08
group
Female 0-9 696.0 1602.0 1466.0 1170.0 1050.0
Female 10-19 1382.0 3366.0 2882.0 2246.0 2077.0
Female 20-29 1411.0 3676.0 3346.0 2940.0 2581.0
Female 30-39 1250.0 3415.0 3129.0 2532.0 2431.0
Female 40-49 1248.0 3370.0 3040.0 2406.0 2252.0
Female 50-59 1167.0 3216.0 3122.0 2577.0 2367.0
Female 60-69 582.0 1755.0 1740.0 1376.0 1437.0
Female 70-79 381.0 1034.0 975.0 818.0 877.0
Female 80-89 142.0 354.0 439.0 421.0 548.0
Female 90+ 46.0 87.0 125.0 120.0 179.0
Female <50 0.0 0.0 1.0 1.0 0.0
Female Unknown 0.0 0.0 0.0 0.0 0.0
Male 0-9 817.0 1803.0 1554.0 1253.0 1086.0
Male 10-19 1468.0 3335.0 3078.0 2233.0 1976.0
Male 20-29 1432.0 3596.0 3427.0 2826.0 2489.0
Male 30-39 1161.0 3043.0 2733.0 2304.0 2146.0
Male 40-49 1199.0 3174.0 2972.0 2256.0 2166.0
Male 50-59 1255.0 3408.0 3377.0 2664.0 2432.0
Male 60-69 691.0 2060.0 1908.0 1636.0 1598.0
Male 70-79 360.0 993.0 967.0 836.0 835.0
Male 80-89 132.0 298.0 345.0 302.0 364.0
Male 90+ 24.0 44.0 48.0 48.0 77.0
Male <50 0.0 0.0 1.0 0.0 0.0
Male Unknown 0.0 0.0 0.0 0.0 0.0
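`pivot_table(..., aggfunc='count')` counts the non-null `Date_statistics` values, that is, the number of reported cases, for every combination of `period` and `group`. Combinations without any case are missing from the pivot and show up as NaN, which `fillna(0)` turns into zeros; those NaN placeholders are also why the counts in Out[20] are floats such as `696.0`. A minimal sketch on a few made-up rows:

```python
import pandas as pd

# Hypothetical case rows: one row per reported case (not real RIVM data).
cases = pd.DataFrame({
    'period': ['2021W11', '2021W11', '2021W11', '2021W12'],
    'group': ['Male 0-9', 'Male 0-9', 'Female 20-29', 'Male 0-9'],
    'Date_statistics': pd.to_datetime(
        ['2021-03-15', '2021-03-16', '2021-03-17', '2021-03-22']),
})

# Count cases per week (rows) and sex/age group (columns); empty cells become 0.
weekly = cases.pivot_table(
    index='period',
    columns='group',
    values='Date_statistics',
    aggfunc='count').fillna(0)
print(weekly)
```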

Heat map of the weekly corona time series.

The heat map shows the weekly number of positive tests for each sex and age group, so week-to-week increases are visible per group.

In [21]:
## Define array of row and columns headers 
durationsweekly = data_df_heat_map.index 
agegroups = data_df_heat_map.columns 

## Figure size; adjust to the number of weeks and groups
fig, ax = plt.subplots(figsize=(26,12))

heatmap = plt.imshow(
    np.log(data_df_heat_map[data_df_heat_map > 0].loc[:].transpose()), 
    cmap='Accent', 
    interpolation='None', 
    aspect='auto', 
    origin='lower')

# One tick per week (x axis) and per group (y axis)
ax.set_xticks(np.arange(len(durationsweekly)))
ax.set_yticks(np.arange(len(agegroups)))

ax.set_xticklabels(durationsweekly)
ax.set_yticklabels(agegroups)

# X labels diagonally
plt.setp(
    ax.get_xticklabels(), 
    rotation=45, 
    ha="right", 
    rotation_mode="anchor")

# Convert the DataFrame to a NumPy array
np_heat = data_df_heat_map.to_numpy() 

# Write each cell's case count as a text label
for i in range(len(durationsweekly)): 
    for j in range(len(agegroups)): 
        text = ax.text(
            i, 
            j, 
            int(np_heat[i, j]), 
            ha="center", 
            va="center", 
            color="w")
            
ax.set_title("Positive tests weekly, on sex and age group")
fig.tight_layout() 
plt.show() 

The number of cases has risen sharply since the start of 2021 and has stayed high since then.
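The heat map above colours the cells by `np.log` of the counts. Because log(0) is minus infinity, the code first keeps only positive counts with `data_df_heat_map[data_df_heat_map > 0]`; boolean indexing on a DataFrame replaces the other cells with NaN, which `imshow` draws as blank. The sketch below shows the effect on a few counts copied from the Out[20] table (the rows and columns are chosen only for illustration). A week with a single case maps to log(1) = 0, so it is still drawn, at the bottom of the colour scale.

```python
import numpy as np
import pandas as pd

# Counts copied from Out[20]: rows are groups, columns are weeks.
counts = pd.DataFrame(
    {'2021W10': [1466.0, 1.0], '2021W11': [1602.0, 0.0]},
    index=['Female 0-9', 'Female <50'])

# Boolean indexing keeps the positive counts and sets the rest to NaN,
# so the log below never produces -inf.
masked = counts[counts > 0]
log_counts = np.log(masked)
print(log_counts.round(3))
```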

Convert to an interactive Plotly heat map.

The interactive version makes it possible to inspect every cell individually, and as a heat map it shows the impact of the case numbers at a glance.

In [22]:
import plotly.express as px
import numpy as np

# Same log scale as the matplotlib version; zero counts become NaN and stay blank.
# Rows are the sex/age groups, columns the weekly periods (the time axis).
fig = px.imshow(
    np.log(data_df_heat_map[data_df_heat_map > 0].transpose()),
    labels=dict(x='period', y='group', color='log(cases)'))

# The ax.text() loop of the matplotlib cell drew on that cell's axes, not on
# this Plotly figure, so it is left out here; hovering over a cell shows its
# period, group and log-scaled value instead.

fig.show()

In 2021 the Netherlands had an extremely high number of cases, reaching the top alert level, and the heat map shows the numbers rising up to the most recent week.

Result of This Notebook

The density of new cases from 2021W04 to 2021W12 has decreased compared with week 14. The longer the pandemic lasts, the more patients there are, which is why the heat map shows very high values around New Year, even though several corona measures and restrictions had already been introduced in NL. Beyond this exploratory data analysis (EDA), we need good insight into how the measures taken worldwide are working; that is the subject of the next notebook. That way we can determine which measures suit the Netherlands. After that, maps and further analysis can be added to these graphs.
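When comparing weeks it helps to look at the week-on-week growth factor, which the commented-out line `data_df_heat_map / data_df_heat_map.shift()` in the heat-map cell already points to: values above 1 mean more cases than in the previous week. One caveat: the data file is dated 2021-03-26, a Friday, while ISO week 2021W12 runs from 22 to 28 March, so the 2021W12 column covers at most five days and its lower counts should not be read as a real decrease. The sketch below applies the ratio to the Female 0-9 row of Out[20] and leaves the incomplete week out; the variable names are only illustrative.

```python
import pandas as pd

# Weekly positive tests for 'Female 0-9', copied from the Out[20] table.
weekly = pd.Series(
    [1050.0, 1170.0, 1466.0, 1602.0, 696.0],
    index=['2021W08', '2021W09', '2021W10', '2021W11', '2021W12'])

# Growth factor relative to the previous week (the first week has none: NaN).
growth = weekly / weekly.shift()

# 2021W12 is only partly covered by the 2021-03-26 file, so leave it out.
complete = growth.drop('2021W12')
print(complete.round(3))
```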

  • This personal version will be published on GitHub as Mark's Version 1.0. It will be updated through other contributions in the future, to bring the best level of AI to this Corona Measures notebook.